Comprehensive modulation representation for automatic speech recognition

Authors

  • Yadong Wang
  • Steven Greenberg
  • Jayaganesh Swaminathan
  • Ramdas Kumaresan
  • David Poeppel

Abstract

We present a new feature representation for speech recognition based on both amplitude modulation spectra (AMS) and frequency modulation spectra (FMS). A comprehensive modulation spectral (CMS) approach is defined and analyzed based on a modulation model of the band-pass signal. The speech signal is first processed by a bank of specially designed auditory band-pass filters, and CMS features are then extracted from the filter outputs for automatic speech recognition (ASR). A significant improvement in performance on noisy speech is demonstrated. On the Aurora 2 task, the new features yield an improvement of 23.43% relative to traditional mel-cepstrum front-end features using a 3 GMM HMM back-end. Although the improvements are relatively modest, the novelty of the method and its potential for performance enhancement warrant serious attention for future-generation ASR applications.
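The processing chain described in the abstract (band-pass filtering, then amplitude and frequency modulation analysis of each band) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The ideal FFT-mask filter `bandpass_fft` is a hypothetical stand-in for the specially designed auditory filters. AM is taken here as the Hilbert envelope and FM as the instantaneous frequency obtained from the unwrapped analytic phase. All function names, the band edges, the Hann window, and the 16 Hz modulation cutoff used in `cms_features` are illustrative choices. Only NumPy is required.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal (same construction as scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def bandpass_fft(x, fs, lo_hz, hi_hz):
    """Hypothetical stand-in for one auditory filter: ideal FFT-domain band-pass."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo_hz) | (freqs > hi_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def am_fm(band_signal, fs):
    """Split a band-pass signal into AM (envelope) and FM (instantaneous frequency)."""
    z = analytic_signal(band_signal)
    envelope = np.abs(z)                                  # amplitude modulation
    phase = np.unwrap(np.angle(z))
    inst_freq_hz = np.gradient(phase) * fs / (2 * np.pi)  # frequency modulation in Hz
    return envelope, inst_freq_hz


def modulation_spectrum(mod_signal, fs):
    """Hann-windowed magnitude spectrum of a mean-removed modulation signal."""
    centered = mod_signal - np.mean(mod_signal)
    magnitude = np.abs(np.fft.rfft(centered * np.hanning(len(centered))))
    return np.fft.rfftfreq(len(centered), d=1.0 / fs), magnitude


def cms_features(x, fs, bands, max_mod_hz=16.0):
    """Illustrative CMS-style vector: low-rate AM and FM spectra of every band, concatenated."""
    parts = []
    for lo_hz, hi_hz in bands:
        envelope, inst_freq = am_fm(bandpass_fft(x, fs, lo_hz, hi_hz), fs)
        for mod_signal in (envelope, inst_freq):
            freqs, magnitude = modulation_spectrum(mod_signal, fs)
            parts.append(magnitude[freqs <= max_mod_hz])
    return np.concatenate(parts)


# Synthetic check: 1 kHz carrier, 4 Hz AM, 6 Hz FM with 50 Hz peak deviation.
fs = 8000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(
    2 * np.pi * 1000 * t + (50 / 6) * np.sin(2 * np.pi * 6 * t))

env, inst_f = am_fm(bandpass_fft(x, fs, 800, 1200), fs)
am_freqs, am_mag = modulation_spectrum(env, fs)
fm_freqs, fm_mag = modulation_spectrum(inst_f, fs)
print("AM peak (Hz):", am_freqs[np.argmax(am_mag)])
print("FM peak (Hz):", fm_freqs[np.argmax(fm_mag)])

features = cms_features(x, fs, bands=[(800, 1200), (2000, 2400)])
print("feature vector length:", features.shape[0])
```

On this synthetic tone the printed peaks should land near 4 Hz for the AM spectrum and near 6 Hz for the FM spectrum, which illustrates why the two spectra carry complementary information. A practical front end would presumably use a proper auditory filterbank and short overlapping analysis frames instead of one ideal filter applied to a whole one-second signal.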

Similar resources

Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)

This paper shows how genetic programming can be used to select suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers, and the choice of features can significantly affect the performance of the process. Simulations were conducted at SNRs of 5 dB and 10 dB. Test and ...

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Automatic speech emotion recognition using modulation spectral features

In this study, modulation spectral features (MSFs) are proposed for the automatic recognition of human affective information from speech. The features are extracted from an auditory-inspired long-term spectro-temporal representation. Obtained using an auditory filterbank and a modulation filterbank for speech analysis, the representation captures both acoustic frequency and temporal modulation ...

Spectro-temporal Gabor features as a front end for automatic speech recognition

A novel type of feature extraction is introduced to be used as a front end for automatic speech recognition (ASR). Two-dimensional Gabor filter functions are applied to a spectro-temporal representation formed by columns of primary feature vectors. The filter shape is motivated by recent findings in neurophysiology and psychoacoustics which revealed sensitivity towards complex spectro-temporal ...

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have attracted researchers' attention. Among these, lip-reading techniques face many challenges for speech recognition; one of the challenges b...

Publication date: 2005